IN THE SPECIFICATION
Please amend paragraphs 80, 81, 84, 87, 92 and 93 of 
the specification as follows: 

[0080] There are many properties of image appearance 
that one could use as data streams from which one could 
learn appearance models for tracking and object search. 
Examples include local color statistics, multiscale filter 
responses, and localized edge fragments. In this work, the
data streams were derived from the responses of a steerable
filter pyramid (i.e., based on the $G_2$ and $H_2$ filters;
see W. Freeman and E. H. Adelson, "The Design and Use of
Steerable Filters", IEEE Transactions on Pattern Analysis
and Machine Intelligence, 13:891-906, 1991, incorporated
herein by reference). Steerable pyramids provide a
description of the image at different scales and
orientations that is useful for coarse-to-fine differential
motion estimation, and for isolating stability at different
scales, spatial locations, and image orientations. Here
$G_2$ and $H_2$ filters are used at two scales, tuned to
wavelengths of eight and sixteen pixels (subsampled by
factors of two and four), with four orientations at each
scale.

[0081] From the filter outputs, the present inventors 
chose to maintain a representation of the phase structure 
as the appearance model. This provides a natural degree of
amplitude and illumination independence, and it provides
the fidelity for accurate image alignment afforded by
phase-based methods (see, for example, D.J. Fleet and A.D.
Jepson, "Stability of Phase Information", IEEE Transactions
on PAMI, 15(12):1253-1268, 1993, incorporated herein by
reference). Phase responses associated with small filter
amplitudes, or those deemed unstable according to the
techniques described in the above-cited paper, were treated
as outliers.
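
By way of illustration only, the following minimal sketch shows how a phase value and an amplitude-based outlier flag might be obtained from a quadrature filter response. It is an assumption-laden example: a one-dimensional complex Gabor kernel stands in for the $G_2$/$H_2$ pair, the names `gabor_kernel`, `phase_with_outliers`, and `min_amplitude` are hypothetical, and the stability test of the above-cited paper is not implemented.

```python
import numpy as np

def gabor_kernel(wavelength, sigma=None):
    """Complex 1-D Gabor kernel (a stand-in for a quadrature filter pair)."""
    if sigma is None:
        sigma = 0.5 * wavelength  # assumed bandwidth, not taken from the source
    half = int(np.ceil(3 * sigma))
    x = np.arange(-half, half + 1)
    envelope = np.exp(-x**2 / (2.0 * sigma**2))
    return envelope * np.exp(2j * np.pi * x / wavelength)

def phase_with_outliers(signal, wavelength, min_amplitude):
    """Return phase, amplitude, and a mask flagging small-amplitude responses."""
    response = np.convolve(np.asarray(signal, dtype=float),
                           gabor_kernel(wavelength), mode="same")
    amplitude = np.abs(response)
    phase = np.angle(response)
    # Responses with small amplitude are treated as outliers (threshold is hypothetical).
    outlier = amplitude < min_amplitude
    return phase, amplitude, outlier
```

Because only the phase is retained, scaling the input signal by a positive constant leaves the representation unchanged, which is one way to see the amplitude independence noted above.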

[0084] The motion is represented in terms of frame-to- 
frame parameterized image warps. In particular, given the 
warp parameters $c_t$, a pixel $x$ at frame $t-1$ corresponds
to the image location $x_t = w(x; c_t)$ at time $t$, where
$w(x; c_t)$ is the warp function. Similarity transforms are
used here, so $c_t = (u_t, \theta_t, \rho_t)$ is a 4-vector
describing translation, rotation, and scale changes,
respectively. Translations are specified in pixels,
rotations in radians, and the scale parameter denotes a
multiplicative factor, so the parameter vector (0, 0, 0, 1)
is the identity warp. By way of tracking, the target
neighborhood is convected (i.e., warped) forward at each
frame by the motion parameters. That is, given the
parameter vector $c_t$, $\mathcal{N}_t$ is just the
elliptical region provided by warping $\mathcal{N}_{t-1}$ by
$w(x; c_t)$. Other parameterized image warps, and other
parameterized region representations, could also be used
(e.g., see F.G. Meyer and P. Bouthemy, "Region-Based
Tracking Using Affine Motion Models in Long Image
Sequences", CVGIP: Image Understanding, 60(2):119-140,
1994, which is incorporated herein by reference).
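
By way of illustration only, a similarity warp of this kind might be sketched as follows. The function name `similarity_warp`, the ordering of the parameters as (u_x, u_y, theta, rho), and the choice to rotate and scale about the coordinate origin before translating are assumptions made for the example; the specification does not fix these details.

```python
import numpy as np

def similarity_warp(x, c):
    """Apply w(x; c) with c = (u_x, u_y, theta, rho) to points of shape (..., 2)."""
    ux, uy, theta, rho = c
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # Assumed composition: rotate and scale about the origin, then translate.
    return rho * (np.asarray(x, dtype=float) @ rot.T) + np.array([ux, uy])

# The parameter vector (0, 0, 0, 1) leaves every point where it is.
identity = (0.0, 0.0, 0.0, 1.0)
points = np.array([[3.0, -2.0], [10.0, 5.0]])
warped = similarity_warp(points, identity)
```

A region such as the elliptical target neighborhood could be convected by applying the same function to its boundary or interior points.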

[0087] To estimate $c_t$, the following sum of the
log-likelihood and the log-prior is maximized:

$E(c_t) = L(\mathcal{D}_t \mid \mathcal{A}_{t-1}, \mathcal{D}_{t-1}, c_t) + \log p(c_t \mid c_{t-1})$    EQUATION (12)

To maximize $E(c_t)$, a straightforward variant of the
expectation-maximization (EM) algorithm is used, as
described by A. Jepson and M. J. Black in "Mixture Models
for Optical Flow Computation", Proc. IEEE Computer
Vision and Pattern Recognition, CVPR-93, pages 760-761, New
York, June 1993, which is incorporated herein by reference.
This is an iterative, coarse-to-fine algorithm, with
annealing used to keep the method from becoming trapped in
local minima. In short, the E-step determines the
ownership probabilities for the backwards warped data
$\hat{\mathcal{D}}_t$, as in Equation (3) above. The M-step
uses these ownerships to form a linear system for the
update to $c_t$. The components of the linear system are
obtained from the motion constraints weighted by the
ownership probabilities for the W and S processes.
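
By way of illustration only, the ownership-weighted linear system of an M-step might be sketched as a weighted least-squares solve. The sketch assumes linearized motion constraints of the form J_i dc = r_i, one combined ownership weight per constraint, and no prior term; the names `m_step_update`, `jacobians`, `residuals`, and `ownerships` are hypothetical and are not taken from the cited paper.

```python
import numpy as np

def m_step_update(jacobians, residuals, ownerships):
    """Solve the ownership-weighted normal equations for the parameter update."""
    J = np.asarray(jacobians, dtype=float)   # shape (n, p): one constraint per row
    r = np.asarray(residuals, dtype=float)   # shape (n,)
    o = np.asarray(ownerships, dtype=float)  # shape (n,): weights from the E-step
    A = J.T @ (o[:, None] * J)               # sum_i o_i J_i^T J_i
    b = J.T @ (o * r)                        # sum_i o_i J_i^T r_i
    return np.linalg.solve(A, b)
```

Constraints with low ownership contribute little to the system, so data attributed to outliers has correspondingly little influence on the update.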

[0092] In practice, to help avoid becoming stuck in local 
minima, it is useful to apply the EM algorithm with a 
coarse-to-fine strategy and deterministic annealing in 
fitting the motion parameters (see, for example, A.
Jepson and M. J. Black, "Mixture Models for Optical Flow
Computation," Proc. IEEE Computer Vision and Pattern
Recognition, CVPR-93, pages 760-761, New York, June 1993,
which is incorporated herein by reference). The initial
guess for the warp parameters is based on a constant
velocity model, so the initial guess is simply equal to the
estimated warp parameters from the previous frame. By way
of annealing, instead of using the variances
$\sigma_{s,t}^2$ and $\sigma_w^2$ in computing the
ownerships and gradients of Equation (22) for the S and W
components, the parameters $\sigma_S$ and $\sigma_W$ are
used. At each iteration of the EM algorithm, these values
are decreased according to
$\sigma_S \leftarrow \min(0.95\,\sigma_S, \hat{\sigma}_S)$

$\sigma_W \leftarrow \min(0.95\,\sigma_W, \hat{\sigma}_W)$    EQUATION (24)

where $\hat{\sigma}_S$ and $\hat{\sigma}_W$ are the maximum
likelihood variance estimates of the S component and W
component phase differences, over the entire neighborhood,
$\mathcal{N}_t$, given the motion estimate obtained in the
current EM iteration. Once the variances reach a minimal
value, the annealing is turned off and they are allowed to
fluctuate according to the current motion parameters.
Moreover, as the variance of the S component decreases
according to the spatial ensemble of data observations at
each EM iteration, the variances used for each individual
observation in computing ownerships and likelihood
gradients are never allowed to be lower than the
corresponding variance $\sigma_{s,t}^2$.
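
By way of illustration only, the annealing rule of Equation (24) and the per-observation floor might be sketched as follows. The function names `anneal_step` and `floored_variance` are hypothetical, the maximum likelihood estimates fed to the loop are made-up numbers, and the sketch assumes that the annealed value and the floor are expressed in the same units.

```python
def anneal_step(sigma, sigma_ml, rate=0.95):
    """One annealing update: shrink by `rate`, but never stay above the ML estimate."""
    return min(rate * sigma, sigma_ml)

def floored_variance(annealed_variance, model_variance):
    """Per-observation value used in ownerships: never below the model's own variance."""
    return max(annealed_variance, model_variance)

# Hypothetical run: the ML estimates would normally be recomputed at each EM iteration.
sigma_s = 1.0
for sigma_ml in [0.9, 0.8, 0.8, 0.8]:
    sigma_s = anneal_step(sigma_s, sigma_ml)
```

The same update would be applied separately to the W component parameter.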



[0093] Finally, once the warp parameters $c_t$ have been
determined, the appearance model $\mathcal{A}_{t-1}$ is
convected (warped) forward to the current time $t$ using the
warp specified by $c_t$. To perform this warp, a piecewise
constant interpolant is used for the WSL state variables
$m(x, t-1)$ and $\sigma_s(x, t-1)$. This interpolation was
expected to be too crude to use for the interpolation of
the mean $\mu_s(x, t-1)$ for the stable process, so instead
the mean is interpolated using a piecewise linear model.
The spatial phase gradient for this interpolation is
determined from the gradient of the filter responses at the
nearest pixel to the desired location $x$ on the image
pyramid sampling grid (see D.J. Fleet, A.D. Jepson, and M.
Jenkin, "Phase-Based Disparity Measurement," Computer
Vision and Image Understanding, 53(2):198-210, 1991,
incorporated herein by reference).
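
By way of illustration only, the two interpolants might be sketched as follows. The sketch assumes that locations are given as (column, row) pairs in grid units, that gradient arrays `grad_x` and `grad_y` are already available at each grid pixel, and that phase wrapping is ignored; the function names are hypothetical.

```python
import numpy as np

def _nearest_index(shape, x):
    """Row and column of the grid pixel nearest to x = (col, row), clipped to the grid."""
    col = int(np.clip(np.rint(x[0]), 0, shape[1] - 1))
    row = int(np.clip(np.rint(x[1]), 0, shape[0] - 1))
    return row, col

def piecewise_constant(grid, x):
    """Value at the nearest grid pixel (used here for the state variables)."""
    row, col = _nearest_index(grid.shape, x)
    return grid[row, col]

def piecewise_linear(mean, grad_x, grad_y, x):
    """Nearest-pixel mean plus a first-order correction from the local gradient."""
    row, col = _nearest_index(mean.shape, x)
    return mean[row, col] + grad_x[row, col] * (x[0] - col) + grad_y[row, col] * (x[1] - row)
```

For a quantity that varies linearly across the grid, the second interpolant reproduces the value exactly, whereas the first is in error by up to half a pixel's worth of gradient.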